Combining Optimal Clustering And Hidden Markov Models For Extractive Summarization

نویسندگان

Pascale Fung

Grace Ngai

Chi-Shun Cheung

چکیده

We propose Hidden Markov models with unsupervised training for extractive summarization. Extractive summarization selects salient sentences from documents to be included in a summary. Unsupervised clustering combined with heuristics is a popular approach because no annotated data is required. However, conventional clustering methods such as K-means do not take text cohesion into consideration. Probabilistic methods are more rigorous and robust, but they usually require supervised training with annotated data. Our method incorporates unsupervised training with clustering, into a probabilistic framework. Clustering is done by modified K-means (MKM)--a method that yields more optimal clusters than the conventional K-means method. Text cohesion is modeled by the transition probabilities of an HMM, and term distribution is modeled by the emission probabilities. The final decoding process tags sentences in a text with theme class labels. Parameter training is carried out by the segmental K-means (SKM) algorithm. The output of our system can be used to extract salient sentences for summaries, or used for topic detection. Content-based evaluation shows that our method outperforms an existing extractive summarizer by 22.8% in terms of relative similarity, and outperforms a baseline summarizer that selects the top N sentences as salient sentences by 46.3%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Instructions for use Title Rhetorical Structure Modeling for Lecture Speech Summarization

We propose an extractive summarization system with a novel non-generative probabilistic framework for speech summarization. One of the most under-utilized features in extractive summarization is rhetorical information -semantically cohesive units that are hidden in spoken documents. We propose Rhetorical-State Hidden Markov Models (RSHMMs) to automatically decode this underlying structure in sp...

متن کامل

Rhetorical Structure Modeling for Lecture Speech Summarization

متن کامل

Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models

The purpose of extractive summarization is to automatically select indicative sentences, passages, or paragraphs from an original document according to a certain target summarization ratio, and then sequence them to form a concise summary. In this paper, in contrast to conventional approaches, our objective is to deal with the extractive summarization problem under a probabilistic modeling fram...

متن کامل

Learning to Model Domain-Specific Utterance Sequences for Extractive Summarization of Contact Center Dialogues

This paper proposes a novel extractive summarization method for contact center dialogues. We use a particular type of hidden Markov model (HMM) called Class Speaker HMM (CSHMM), which processes operator/caller utterance sequences of multiple domains simultaneously to model domain-specific utterance sequences and common (domainwide) sequences at the same time. We applied the CSHMM to call summar...

متن کامل

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from unannotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. We then apply our method to two complement...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2003

Combining Optimal Clustering And Hidden Markov Models For Extractive Summarization

نویسندگان

چکیده

منابع مشابه

Instructions for use Title Rhetorical Structure Modeling for Lecture Speech Summarization

Rhetorical Structure Modeling for Lecture Speech Summarization

Extractive Chinese Spoken Document Summarization Using Probabilistic Ranking Models

Learning to Model Domain-Specific Utterance Sequences for Extractive Summarization of Contact Center Dialogues

Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

عنوان ژورنال:

اشتراک گذاری